Fault Tolerance: Design and Exploratory Ideas
Abstract
In this article we describe the fault-tolerance architecture of the PerDiS platform. This architecture results from the work done during the first six months at INESC, in close interaction with the other partners of the project (mainly INRIA-SOR and INRIA-SIRAC). We describe the overall fault-tolerance architecture and its integration within PerDiS, the interfaces and implementation provided in the preliminary platform, and the aspects that we intend to explore in the coming months so that we can support them in the intermediate and advanced platforms.
1 Introduction
Cooperative engineering requires fault-tolerant software. Even in a local network, crashes and communication failures occur with non-negligible frequency. Such partial failures may cause inconsistencies in applications, with unpredictable results. Worse still, careless application of corrective measures may aggravate the inconsistencies rather than fix them. In the presence of faults, the platform mechanisms must remain safe, and long-running applications should be able to make progress.
Our fault-tolerance objectives are to: (i) reliably store persistent data on backing storage; (ii) replicate backing storage, and ensure consistency between replicas; (iii) ensure transactional properties; (iv) support tentative updates; (v) provide checkpointing.
Due to the nature of concurrent engineering work in a large-scale environment, data is heavily shared but update conflicts are relatively uncommon. Optimistic transaction models are appropriate in this environment and are also well adapted to slow and unreliable network links.
The fault-tolerance architecture described in this article takes into account the aspects mentioned above and resulted from the work done at INESC during the first six months of the PerDiS project. This design was done in close interaction with the other partners (mainly INRIA-SOR and INRIA-SIRAC).
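To illustrate why an optimistic model suits workloads where conflicts are rare, the following is a minimal sketch of version-based optimistic validation. It is not the PerDiS interface: the names Store, Transaction and ConflictError, and the per-object version counter, are assumptions made only for this example. Each transaction buffers its updates tentatively and checks at commit time that nothing it read has changed in the meantime.

```python
class ConflictError(Exception):
    """Raised when commit-time validation finds a stale read (hypothetical name)."""


class Store:
    """In-memory stand-in for shared persistent storage (illustrative only)."""

    def __init__(self):
        self._data = {}     # key -> committed value
        self._version = {}  # key -> number of committed updates

    def read(self, key):
        # Return the committed value together with its current version.
        return self._data.get(key), self._version.get(key, 0)

    def commit(self, read_set, write_set):
        # Validation phase: abort if any object that was read has since changed.
        for key, seen_version in read_set.items():
            if self._version.get(key, 0) != seen_version:
                raise ConflictError(key)
        # Write phase: install the tentative updates and bump their versions.
        for key, value in write_set.items():
            self._data[key] = value
            self._version[key] = self._version.get(key, 0) + 1


class Transaction:
    """Buffers tentative updates locally; takes no locks while running."""

    def __init__(self, store):
        self._store = store
        self._reads = {}   # key -> version observed at first read
        self._writes = {}  # key -> tentative new value

    def get(self, key):
        if key in self._writes:          # read your own tentative update
            return self._writes[key]
        value, version = self._store.read(key)
        self._reads.setdefault(key, version)
        return value

    def put(self, key, value):
        self._writes[key] = value        # invisible to others until commit

    def commit(self):
        self._store.commit(self._reads, self._writes)
```

In this sketch, two transactions that read the same object and then both try to update it will see the second commit fail with ConflictError, while transactions touching disjoint objects never wait for each other. Writes to objects that were never read are not validated, which is a simplification of this example.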
Similar resources
Novel Defect Terminolgy Beside Evaluation And Design Fault Tolerant Logic Gates In Quantum-Dot Cellular Automata
Quantum dot Cellular Automata (QCA) is one of the important nano-level technologies for implementation of both combinational and sequential systems. QCA have the potential to achieve low power dissipation and operate high speed at THZ frequencies. However large probability of occurrence fabrication defects in QCA, is a fundamental challenge to use this emerging technology. Because of these vari...
An approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...
Challenging Malicious Inputs with Fault Tolerance Techniques
Most of the attacks attempt an initial activation which means the first occurrence of an error provoked by the fault. If its unable to stop the propagation, a fault will be transformed into a failure, causing consequences. This paper presents an exploratory research about the integration of fault tolerance aiming defenses against malicious inputs. When a fault occurs, these techniques provide m...
Fault Tolerant Rings: Creation and Maintenance
Numerous algorithms have been provided for maintaining the topology of a ring network in a fault-free environment. But a simple, fast and effective fault-tolerant algorithm has been lacking in this field so far. In this report, we seek to address this problem and provide a detailed design to solve the problem in the general case. As our detailed analysis proves, our algorithms function very wel...
Fault Tolerance in Mobile ad hoc Network: A Survey
Fault-Tolerance is an important design issue to construct a reliable mobile ad hoc network. Many types of faults may occur in mobile network such as link failure, node failure, misbehaving nodes, network failure, power and energy consumption etc. This paper mainly aims in surveying the research articles by combining them together to find the fault tolerant research problem in mobile ad hoc netw...